Coalescing compact binary systems consisting of neutron stars and/or
black holes should be detectable with upcoming advanced
gravitational-wave detectors such as LIGO, Virgo, GEO and KAGRA.
Gravitational-wave experiments to date have been riddled with
non-Gaussian, non-stationary noise that makes it challenging to
ascertain the significance of an event. A popular method to estimate
significance is to time shift the events collected between detectors in
order to establish a false coincidence rate. Here we propose a method
for estimating the false alarm probability of events using variables
commonly available to search candidates that does not rely on explicitly
time shifting the events while still capturing the non-Gaussianity of
the data. We present a method for establishing a statistical detection
of events in the case where several silver-plated (3-5$\sigma$) events
exist but not necessarily any gold-plated ($> 5\sigma$) events. We use
LIGO data and a simulated, realistic, blind signal population to test
our method.



I. INTRODUCTION 

Detecting gravitational waves (GWs) from coalescing neutron stars
and/or black holes should be possible with advanced GW detectors such
as LIGO, Virgo, GEO and KAGRA [1]. If the performance of past detectors
is any indicator of the performance of future GW detectors, they are
likely to be affected by non-Gaussian noise [2]. Coincident observations
are crucial in validating the detection of GWs, but it is necessary to
establish the probability that the coincident event could arise from
noise alone.

If the detectors' data were Gaussian and stationary, 
it would be straightforward to compute the false alarm 
probability (FAP) of a coincident event based solely on 
its signal-to-noise ratio (SNR) and the number of inde- 
pendent trials. With non-stationary, non-Gaussian data 
the SNR is not sufficient to describe the significance of 
an event and, furthermore, the distribution of detector 
noise is not known a priori. 

Estimating false-coincident backgrounds from time-delay coincidence
associated with searches for GWs was first proposed for targeted
compact binary coalescence GW searches in [3]. This method has been the
most commonly used in subsequent searches [4-14]. We present a method
to estimate the false alarm probability of a GW event from coalescing
compact objects without time shifts by measuring the false alarm
probability distributions for non-coincident events using a set of
common variables available to the searches. This greatly simplifies
analysis and lends itself nicely to an online analysis environment.

This paper is organized as follows. In Sec. II we describe a formalism
for ranking GW events and establishing the probability distribution for
a given event's rank in noise. In Sec. III we present how to estimate
the significance of a population of compact binary coalescence (CBC)
events, which might include silver-plated (i.e. less than $5\sigma$)
events. In Sec. IV we test our method with a mock, advanced detector
search that uses four days of LIGO fifth science run (S5) data that has
been recolored to have an Advanced LIGO spectrum and that contains a
plausible, simulated, blind population of double neutron star binary
mergers. We demonstrate that we can detect GWs from neutron star
binaries with very low false alarm probability.

II. METHOD 

GW searches for compact binary coalescence begin by matched filtering
data in the detectors [15]. If peaks in the SNR time series for more
than one detector are consistent with the light travel time between
detectors and timing errors, these peaks are considered to be a
coincident event.
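
The coincidence test described above can be sketched as a simple
time-difference check. The function name, the 10 ms light travel time
(roughly the value for the two LIGO sites) and the 5 ms timing allowance
are assumptions chosen for illustration; they are not values taken from
this analysis.

```python
def is_coincident(t_a, t_b, light_travel_time=0.010, timing_error=0.005):
    """Return True if two single-detector peak times (seconds) are coincident.

    Peaks count as coincident when their separation does not exceed the
    light travel time between the detectors plus an allowance for timing
    errors. Both default values are illustrative assumptions.
    """
    return abs(t_a - t_b) <= light_travel_time + timing_error


print(is_coincident(100.000, 100.008))  # True: 8 ms apart, inside the 15 ms window
print(is_coincident(100.000, 100.030))  # False: 30 ms apart
```

A real search would use the light travel time appropriate to each
detector pair and a timing allowance matched to the measured timing
accuracy of the pipeline.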

GW data to date have not been stationary and Gaussian [2], making it
difficult to model the noise in GW searches. Non-stationary noise
degrades the effectiveness of standard matched filter searches. For that
reason additional signal consistency tests are often employed, such as
explicit $\chi^2$ tests [16, 17]. Non-stationarity occurs on several
timescales. Here we are more concerned with short-duration
non-stationary bursts of noise called glitches, for which $\chi^2$ tests
are very useful discriminators.

In this section we present a method that uses common variables
available to a compact binary search to estimate the FAP without
relying on time shifting the detector data. Although many variables and
measurements may be used, in this paper we consider two parameters, the
matched filter SNR $\rho_i$ and the $\chi^2$ statistic $\chi_i^2$, which
depend on the detector $i$, as well as parameters intrinsic to the
source that the template describes, such as mass and spin, $\theta$. We
then introduce the framework for evaluating the FAP of GW candidates.



A. Ranking events

Here is our concise definition of a coincident gravitational-wave
search for compact binary sources: i) the search consists of $D$
detectors; ii) we seek the significance of an event found in the $D$
detectors localized in time; iii) the intrinsic parameters of the event
are unknown a priori. Our detection pipeline measures the significance
as a function of the parameters of the template waveform, $\theta$.

For each detector $i$ of a $D$-detector network we use $\rho_i$ and
$\chi_i^2$ to rank candidates with parameters $\theta$ from least likely
to be a gravitational wave to most likely. We use a standard likelihood
ratio [H] defined as

$$\mathcal{L}(\rho_1, \chi_1^2, \ldots, \rho_D, \chi_D^2, \theta) = \frac{P(\rho_1, \chi_1^2, \ldots, \rho_D, \chi_D^2, \theta \,|\, s)}{P(\rho_1, \chi_1^2, \ldots, \rho_D, \chi_D^2, \theta \,|\, n)}, \tag{1}$$

where $P(\ldots|s)$ is the probability of observing $(\ldots)$ given a
signal, and $P(\ldots|n)$ is the probability of observing $(\ldots)$
given noise. It is assumed that the signal distribution has been
marginalized over all relevant parameters and that $\theta$ refers only
to the template waveform parameters that are measured by the pipeline.
We make the simplifying assumption [15] that the likelihood can be
factored into products of likelihoods from individual detectors,

$$\mathcal{L}(\rho_1, \chi_1^2, \ldots, \rho_D, \chi_D^2, \theta) = \prod_{i=1}^{D} \mathcal{L}_i(\rho_i, \chi_i^2, \theta). \tag{2}$$

The simplification that the likelihood function can be built from these
products implies statistical independence between detectors for both
signals and noise. This results in a suboptimal ranking statistic.
However, we can compute the FAP associated with this statistic, and in
fact, it becomes much easier to do so.



B. Computing the FAP

The FAP is the probability of measuring a given $\mathcal{L}$ if the
data contain only noise. N.B., this is not the same as assessing the
probability that the data contain only noise, which requires knowing the
prior probabilities of both signal and noise. In constructing the FAP,
$P(\mathcal{L}|n)$, we start with

$$P(\mathcal{L}, \theta \,|\, n) = \int_\Sigma P(\mathcal{L}_1, \ldots, \mathcal{L}_D, \theta \,|\, n)\, d^{D-1}\Sigma, \tag{3}$$

where $\Sigma$ is the surface of constant $\mathcal{L} = \prod_i
\mathcal{L}_i$. From (2), we have, assuming that the likelihood values
in noise are independent between the detectors,

$$P(\mathcal{L}_1, \ldots, \mathcal{L}_D, \theta \,|\, n) = \prod_{i=1}^{D} P(\mathcal{L}_i, \theta \,|\, n), \tag{4}$$

where $P(\mathcal{L}_i, \theta \,|\, n)$ is obtained by marginalizing
over $\rho_i$ and $\chi_i^2$ in the single-detector terms,



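$$P(\mathcal{L}_i, \theta \,|\, n) = \int_\sigma P(\rho_i, \chi_i^2, \theta \,|\, n)\, d\sigma, \tag{5}$$

where $\sigma$ is the contour of constant $\mathcal{L}_i$ in the
$\{\rho_i, \chi_i^2\}$ surface at constant $\theta$. Implicit in (3) and
(5) is the assumption that the coincidence criteria do not depend on
$\rho_i$, $\chi_i^2$ or $\theta$. Finally, $P(\mathcal{L}|n)$ is
obtained by marginalizing over $\theta$,

$$P(\mathcal{L}|n) = \int P(\mathcal{L}, \theta \,|\, n)\, d\theta. \tag{6}$$

The probability of observing an event with a likelihood value at least
as large as some threshold $\mathcal{L}^*$ is

$$P(\mathcal{L}^*|n) = \int_{\mathcal{L}^*}^{\infty} P(\mathcal{L}|n)\, d\mathcal{L}. \tag{7}$$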



A GW search will typically produce multiple coincident events during a
given experiment, so there will be multiple opportunities to produce an
event with a certain likelihood value. We are ultimately interested in
the probability of getting one or more events with $\mathcal{L} \geq
\mathcal{L}^*$ after all the events are considered. The probability of
getting at least one such event after forming $M$ independent
coincidences [20] is the complement of the binomial probability of
getting no such events,

$$P(\mathcal{L}^*|n_1, \ldots, n_M) := 1 - \binom{M}{0} P(\mathcal{L}^*|n)^0 \left(1 - P(\mathcal{L}^*|n)\right)^M = 1 - \left(1 - P(\mathcal{L}^*|n)\right)^M. \tag{8}$$

This is the FAP at $\mathcal{L}^*$ in an experiment that yielded $M$
coincident events. In what follows, we will drop the explicit $n_1,
\ldots, n_M$ notation and simply use $n$, where it is assumed that we
have corrected for the number of trials.
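
As a rough illustration of (7) and (8), the following Python sketch
estimates the single-trial FAP as the fraction of noise-only likelihood
samples at or above a threshold and then applies the trials factor. The
function names, the use of an empirical sample in place of the
marginalized distribution $P(\mathcal{L}|n)$, and the toy numbers are
all assumptions made for illustration; they are not part of the
pipeline described here.

```python
import math


def single_trial_fap(noise_likelihoods, threshold):
    """Empirical stand-in for Eq. (7): fraction of noise samples >= threshold."""
    count = sum(1 for value in noise_likelihoods if value >= threshold)
    return count / len(noise_likelihoods)


def fap_after_trials(p_single, num_trials):
    """Eq. (8): 1 - (1 - p)^M, the chance of at least one noise event in M trials."""
    if p_single >= 1.0:
        return 1.0
    # log1p/expm1 keep precision when p_single is tiny. This is a common
    # numerical shortcut, not the series method described in Appendix A.
    return -math.expm1(num_trials * math.log1p(-p_single))


# Toy noise sample (hypothetical values, for illustration only).
noise = [0.1, 0.5, 0.9, 1.3, 2.0, 2.4, 3.1, 4.8, 7.5, 12.0]
p = single_trial_fap(noise, threshold=4.8)  # 3 of the 10 samples are >= 4.8
print(p)
print(fap_after_trials(p, num_trials=1))         # one trial leaves the FAP unchanged
print(fap_after_trials(1e-9, num_trials=60000))  # close to M * p when M * p is small
```

For small $M P(\mathcal{L}^*|n)$ the corrected FAP is approximately
$M P(\mathcal{L}^*|n)$, which is why the number of independent
coincidences sets the smallest FAP an experiment can resolve.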



III. GW EVENTS AS A POISSON DISTRIBUTION

Historically, GW experiments have used rates to rank events [10-14].
Assuming that the likelihood function is independent of time over the
duration of the experiment (or can be approximated as such), we can
cluster the most significant events of the search over a duration
longer than the correlation induced by the filter, and we might expect
the events arising from noise to obey Poisson statistics. In what
follows we assume that this is in fact the case and connect our
estimation of FAP with the false alarm rate (FAR) often quoted in
gravitational-wave searches for compact binary coalescence.

For a Poisson process with mean $\lambda$, the probability of observing
$N$ or more events is given by the survival function

$$P(N|\lambda) = 1 - e^{-\lambda} \sum_{j=0}^{N-1} \frac{\lambda^j}{j!}. \tag{9}$$

Using $P(\mathcal{L}^*)$ from (8), setting $N = 1$, and solving for
$\lambda$ tells us the mean number of noise events with $\mathcal{L}
\geq \mathcal{L}^*$,

$$\lambda(\mathcal{L}^*) = -\ln\left[1 - P(\mathcal{L}^*)\right]. \tag{10}$$

The inverse false alarm rate is given by IFAR $= T/\lambda$, where $T$
is the observation time of the experiment.

If $\mathcal{L}^*_N$ is the likelihood of the $N$th most significant
event, the number of background events expected with $\mathcal{L} \geq
\mathcal{L}^*_N$ is

$$\lambda(\mathcal{L}^*_N) = -\ln\left[1 - P(\mathcal{L}^*_N)\right]. \tag{11}$$

Since we observed $N$ events with $\mathcal{L} \geq \mathcal{L}^*_N$,
the probability of noise having produced at least this many events is
found by substituting (11) into (9),

$$P(N|\lambda(\mathcal{L}^*_N)) = 1 - e^{-\lambda(\mathcal{L}^*_N)} \sum_{j=0}^{N-1} \frac{\lambda(\mathcal{L}^*_N)^j}{j!}. \tag{12}$$

A population of events can collectively be more significant than the
single most significant event alone. Indeed, population analyses have
previously been employed in looking for GW signals associated with
gamma-ray bursts (GRBs). For example, a Student-t test was proposed in
[21] to test for deviations in the cross-correlation of detectors'
output preceding a set of times associated with GRBs (i.e., on-source
times) when compared to other off-source times not associated with
GRBs; a binomial test was employed in [22, 23] using the X% most
significant events to test for excess numbers of events at their
associated FAPs; a Kolmogorov test was used in [24] to look for
deviations from isotropy in GRB direction based on the directional
sensitivity of the bar detectors; and a Mann-Whitney U (or
Mann-Whitney-Wilcoxon) test was performed in [25] to test whether all
the FAPs associated with the on-source events of the GRBs were on
average smaller than the expected distribution given by the off-source
events, as would be the case if the average significance were elevated
due to the presence of GWs in the on-source events.

As noted in [22, 23], seeking significance by considering different
choices of population diminishes the significance of each on account of
the trials that have been conducted. We control this by restricting
ourselves to considering only populations consisting of contiguous sets
of events that include the most significant, and that are limited to a
maximum size $N_\mathrm{max}$, where $N_\mathrm{max}$ is the rank of the
most significant event at whose ranking statistic (IFAR) value the
expected number of background events was greater than 1. There are
$N_\mathrm{max}$ choices of population possible, so we incur that cost
from the number of trials, modifying the FAP in (12) to

$$P(N|N_\mathrm{max}) = 1 - \left(1 - P(N|\lambda(\mathcal{L}^*_N))\right)^{N_\mathrm{max}}. \tag{13}$$



IV. EXAMPLE



We have applied these techniques to a mock search for GWs from binary
neutron stars in four days of S5 LIGO data that has been recolored to
match the Advanced LIGO design spectrum [26, 27]. This provides a
potentially realistic data set that contains glitches from the original
LIGO instruments. A population of neutron star binaries was added at a
rate of 4 / Mpc$^3$ / Myr (see [1] for the expected rates). We
self-blinded the signal parameters with a random number generator.

Our analysis targeted compact binary systems with component masses
between 1.2 and 2 $M_\odot$. We used 3.5 post-Newtonian order
stationary phase approximation templates to cover the parameter space
with a 97% minimal match [35], neglecting the effects of spin in the
waveform models [22]. This required $\sim$15,000 templates. We started
the matched filter integrals at 15 Hz and extended the integral to the
innermost stable circular orbit frequency. The analysis gathered the
data, whitened it, filtered it, identified events in the single
detectors, found coincidences and ranked the events by their joint
likelihoods. The filtering algorithm is described in [50].

The previous section described our method for estimat- 
ing the significance of events but did not describe many 
details of how the calculation is done in practice. We will 
point out a few of those details now. 

The numerator of (1) is evaluated by assuming the signals follow their
expected distribution in Gaussian noise. We note that this is a
reasonable assumption because detections are likely to come from
periods of relatively stationary and Gaussian data. Note that the
expectation for $\rho$ can be obtained by assuming that sources are
distributed uniformly in space. The expectation for the $\chi^2$ of a
signal can be found in [15].

The denominator of (1) is found by explicitly histogramming the single
detector events that are not found in coincidence. By excluding
coincident events we lower the chance that a gravitational wave will
bias the noise distribution of the likelihoods. In general the
histogramming will suffer from finite statistics and "edge" effects. We
generate the histograms at a finer resolution than required to track
the likelihood and then apply a Gaussian smoothing kernel with a width
characteristic of the uncertainty in $\rho$.
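
The histogram-then-smooth step can be sketched in one dimension as
follows. This is a simplified illustration only: the analysis described
here histograms in both $\rho$ and $\chi^2$, whereas the sketch bins
$\rho$ alone, and the function names, bin edges, kernel width and
trigger values are all hypothetical.

```python
import math


def histogram(values, lo, hi, num_bins):
    """Count values into num_bins equal-width bins on [lo, hi)."""
    counts = [0.0] * num_bins
    width = (hi - lo) / num_bins
    for value in values:
        if lo <= value < hi:
            index = min(int((value - lo) / width), num_bins - 1)
            counts[index] += 1.0
    return counts


def gaussian_smooth(counts, sigma_bins):
    """Spread each bin's count with a normalized Gaussian kernel (width in bins)."""
    half = int(math.ceil(4 * sigma_bins))
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in range(-half, half + 1)]
    norm = sum(kernel)
    kernel = [weight / norm for weight in kernel]
    smoothed = [0.0] * len(counts)
    for i, count in enumerate(counts):
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(counts):  # weight falling outside the grid is dropped
                smoothed[j] += count * weight
    return smoothed


# Hypothetical non-coincident single-detector SNRs.
rho_samples = [5.6, 5.7, 5.7, 5.9, 6.2, 6.3, 7.1]
raw = histogram(rho_samples, lo=5.5, hi=8.5, num_bins=60)
smooth = gaussian_smooth(raw, sigma_bins=3.0)
print(sum(raw), sum(smooth))
```

In this sketch, weight that the kernel spreads beyond the histogram
boundaries is simply discarded, which is one simple way an "edge" effect
can appear; a production implementation would need to treat the
boundaries more carefully.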

We are unable to collect enough statistics to fully resolve the tail of
the background $\rho$ distribution. Thus, we add a prior distribution
into the background statistics that models the $\rho$ falloff as
expected from a two-degree-of-freedom matched filter in Gaussian noise,
i.e. $p(\rho|n) \propto \exp[-\rho^2/2]$. This helps ensure that the
likelihood contours increase as a function of $\rho$ at large $\rho$.
At some point the probability of getting a given value of $\rho$,
$\chi^2$ becomes smaller than double precision float epsilon. We extend
the background distribution above a given value of $\rho$ with a
polynomial in $\rho$ that falls off faster than the signal distribution
(which is $\propto \rho^{-4}$) but is shallow enough to prevent
numerical problems. In both cases the point of the prior is not to
influence the ranking of typical events but rather to make the
calculations more numerically well-behaved. The prior is added so that
the total probability amounts only to a single event in each detector.
Thus the background (as billions of events are collected) quickly
overwhelms the prior except at the edges where there is no data. The
point where the calculation is no longer based on having at least 1
actual event in background is important since it will effectively mark
the limiting FAP. More discussion of that point follows.

In Fig. 1 we show some of the intermediate data used in estimating the
significance of events in our example. Namely, we show the individual
likelihood contours for $\rho$ and $\chi^2$ described in (2) in the H1
and L1 instruments for signals with a chirp mass consistent with a
neutron star binary (1.2 $M_\odot$) in Figs. 1(a) and 1(b),
respectively. The probability of getting an event with a likelihood
greater than $\mathcal{L}^*$ after $M$ trials for the H1 and L1
instruments (8) is shown in Fig. 1(c). Our ability to measure
$P(\mathcal{L}^*|n)$ is limited by the number of events that we collect
in our background estimate. The shaded region shows the $\sqrt{N}$
error region found by assuming Poisson errors on the number of events
that went into computing a given point on the curve. We have indicated
the FAP at which there ceases to be more than 1 event collected in the
background by a dashed line. The dashed line shows that
$P(\mathcal{L}^*|n)$ is supported by background events down to
$\mathcal{P} := 7 \times 10^{-5}$, which is nearly the FAP required for
a $4\sigma$ detection. Below the dashed line the FAP estimate is
dominated by the Gaussian smoothing kernel applied to the planes in
Figs. 1(a) and 1(b). We believe that it is reasonable to trust the FAP
estimate beyond the single background event limit but note that
$5\sigma$ level confidence can still be reached without extrapolation
with tighter coincidence criteria. Tighter coincidence criteria would
reduce the trials factor and permit higher significances to be
estimated. The best way to do this is to demand that three or more
detectors see an event. In our example a third detector would lower the
trials factor by $\sim 100$, which would shift the limiting FAP,
$\mathcal{P}$, to $\sim 7 \times 10^{-7}$. It is worth mentioning that
the background events and number of independent trials are accumulated
at the same rate. Thus one cannot decrease the limiting FAP by
collecting more data.
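
The correspondence between a FAP and a "number of sigma" can be checked
with a short calculation. The sketch below assumes a two-sided Gaussian
convention, under which $7 \times 10^{-5}$ lies just below $4\sigma$ and
$7 \times 10^{-7}$ just below $5\sigma$; the convention and the function
names are assumptions, since the text does not state which convention
was used.

```python
from statistics import NormalDist


def fap_to_sigma(fap):
    """Two-sided Gaussian-equivalent significance of a false alarm probability."""
    return -NormalDist().inv_cdf(fap / 2.0)


def sigma_to_fap(sigma):
    """Two-sided Gaussian tail probability beyond +/- sigma."""
    return 2.0 * NormalDist().cdf(-sigma)


limiting_fap = 7e-5                          # single-background-event limit quoted above
print(fap_to_sigma(limiting_fap))            # a little under 4
print(fap_to_sigma(limiting_fap / 100.0))    # trials factor lowered by ~100: a little under 5
print(sigma_to_fap(5.0))                     # FAP needed for 5 sigma in this convention
```

Under a one-sided convention the same FAPs would map to slightly larger
significances, so the convention should be fixed before comparing
against a detection threshold.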

After assigning the FAP to events we also assign a FAR according to
(10). This allows us to produce the standard IFAR plot commonly
produced in recent searches for compact binaries [10-14] without having
relied on time shifting the detector events to estimate the background.
This is shown in Fig. 2(a).
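
The FAR assignment in (10) and the population statistic of Sec. III,
Eqs. (11)-(13), can be sketched together as follows. The function
names, the input FAPs and the choice $N_\mathrm{max} = 2$ are
hypothetical and serve only to exercise the formulas; the sketch does
not reproduce the numbers of this analysis.

```python
import math


def expected_background(fap):
    """Eqs. (10)/(11): mean number of noise events, lambda = -ln(1 - FAP)."""
    return -math.log1p(-fap)


def ifar(fap, obs_time):
    """IFAR = T / lambda, in the same time units as obs_time."""
    return obs_time / expected_background(fap)


def poisson_survival(n, lam):
    """Eqs. (9)/(12): probability of n or more events for Poisson mean lam."""
    term, total = 1.0, 0.0
    for j in range(n):  # accumulates sum_{j=0}^{n-1} lam^j / j!
        total += term
        term *= lam / (j + 1)
    return 1.0 - math.exp(-lam) * total


def population_fap(faps, n_max):
    """Eq. (13) for each leading population of size N = 1..len(faps).

    faps must be ordered from most to least significant event.
    """
    results = []
    for n, fap in enumerate(faps, start=1):
        p_n = poisson_survival(n, expected_background(fap))
        results.append(1.0 - (1.0 - p_n) ** n_max)
    return results


# Two events both assigned a FAP of 7e-5 (toy inputs).
print(ifar(7e-5, obs_time=4.0))               # IFAR in days for a 4-day experiment
print(population_fap([7e-5, 7e-5], n_max=2))  # the pair is far less likely than either event
```

For $N = 1$ the Poisson survival function returns the input FAP, so the
population statistic reduces to the single-event case before the
$N_\mathrm{max}$ trials factor is applied.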

The IFARs of the most significant events that came out of this search
in Fig. 2(a) can be identified as the long tail in the observed events
distribution. The top event has a significance greater than $5\sigma$,
the level necessary for claiming the detection of GWs. The second
loudest event has a significance greater than $4\sigma$. Both events
surpass the single background event limit $\mathcal{P}$. If restricted
to this limit then both events are nearly $4\sigma$.

Applying the population procedure we have put forth in Sec. III, we
produced a more significant statement about the presence of GWs beyond
that of the loudest event. This effect is mostly attributed to the
similar significance of the top two events. This could happen in a real
analysis in two ways: 1) Nature could just provide such a set of events
as in this example; 2) both events exhaust our ability to measure
significance and we must place an upper bound on the FAP. The latter
case, although somewhat artificial, could still play an important role
in analysis, especially if one is unable to confidently declare a
single $5\sigma$ event but finds two or more events with 3 or
$4\sigma$. With our example analysis the combination of the two loudest
events was a $5\sigma$ excursion even after restricting the FAP of both
events to be $\mathcal{P}$. After examining the signal population we
found that both candidates were separately associated with signal
injections.

FIG. 1. Figures 1(a) and 1(b) show the likelihoods
$\mathcal{L}_\mathrm{H1}$, $\mathcal{L}_\mathrm{L1}$ as a function of
$\rho$ and $\chi^2$ for H1 and L1, respectively, for templates with
masses consistent with neutron star binaries (1.2-2 $M_\odot$).
$\mathcal{L}_\mathrm{H1}$, $\mathcal{L}_\mathrm{L1}$ appear as the
right-hand side of (2). Lighter colors refer to higher likelihood
values. Figure 1(c) shows the probability of having obtained a given
value of likelihood $\mathcal{L}^*$ or greater from noise as defined in
(8) after $M$ trials (where $M$ is the number of independent
coincidences formed; in this example $M = 6 \times 10^4$).

FIG. 2. Fig. 2(a) is a standard IFAR plot where the shaded regions
correspond to the "$1\sigma$" through "$7\sigma$" regions computed using
the survival function and percent point function associated with the
Poisson distribution. This is used to determine where to stop the
accumulation of events for the population statement. Fig. 2(b) shows
the FAP associated with each of the individual events in the population
we are considering as well as the FAP of obtaining the running $N$
loudest events without restricting the FAP to be greater than
$\mathcal{P} = 7 \times 10^{-5}$. Also shown are the same traces
obtained after restricting the FAPs to be greater than $\mathcal{P}$.



V. CONCLUSION

We have provided a method for estimating the significance of GWs from
compact binary coalescence using measurements of single instrument
populations of $\rho$ and $\chi^2$ as a function of the template
waveform intrinsic parameters. We demonstrated our method with mock
Advanced LIGO data derived from initial LIGO data including a realistic
population of compact binary merger signals and glitches. We found that
between our two loudest events we were able to establish detection at
greater than $5\sigma$ confidence. Both of the loudest two events
exhausted the $\mathcal{P}$ ($\sim 4\sigma$) background estimate, but
the extrapolated FAP of the loudest event exceeded $5\sigma$ on its own.
Both of the loudest events were associated with the blind signal
population introduced into the data and the remaining events were
consistent with the expectation from background.



Appendix A: Numerical Considerations
1. Equation (8)

As the duration of the experiment increases, the numerical evaluation
of (8) using fixed-precision floating point numbers becomes
challenging. In this limit, the per-trial false-alarm probability of
interesting events is very small and the number of trials is very
large. Using double-precision floating-point numbers, when the number
of trials gets larger than about $10^{10}$, FAPs of $10^{-6}$ and 0
become indistinguishable, and as the number of coincidences that are
recorded increases further "$4\sigma$" and "$5\sigma$" events cannot be
differentiated, at which point it is no longer possible to make
detection claims. The following procedure can be used to evaluate (8)
for all $P(\mathcal{L}^*|n)$ and $M$. If $M P(\mathcal{L}^*|n) < 1$ the
Taylor expansion of (8) about $P(\mathcal{L}^*|n) = 0$ converges
quickly,

$$\begin{aligned}
1 - (1 - P)^M &= M P - (M^2 - M) \frac{P^2}{2} + (M^3 - 3M^2 + 2M) \frac{P^3}{6} \\
&\quad - (M^4 - 6M^3 + 11M^2 - 6M) \frac{P^4}{24} + \cdots \\
&= \sum_{i=0}^{\infty} (-1)^i \frac{P^{i+1}}{(i+1)!} \left[(M - 0)(M - 1) \cdots (M - i)\right].
\end{aligned} \tag{A1}$$

The last form yields a recursion relation allowing subsequent terms in
the series to be computed without explicit evaluation of the numerator
and denominator separately (which, otherwise, would quickly overflow):
if the $(i-1)$th term is $X$, the $i$th term in the series is
$X \frac{i - M}{i + 1} P$.

If $M P(\mathcal{L}^*|n) > 1$ the Taylor series still converges (in
fact, as long as the number of trials $M$ is an integer the series is
exact in a finite number of terms) but the series is numerically
unstable: the terms alternate sign and one must rely on careful
cancellation of large numbers to obtain an accurate result. In this
regime the expression's value is close to 1, so $(1 - P)^M$ is small.
If $P$ is small, we can write



$$1 - (1 - P)^M = 1 - e^{M \ln(1 - P)}, \tag{A2a}$$

and then the Taylor expansion of $M \ln(1 - P)$ about $P = 0$ converges
quickly,

$$M \ln(1 - P) = -M P \left(1 + \frac{P}{2} + \frac{P^2}{3} + \cdots\right). \tag{A2b}$$

Altogether, the algorithm for evaluating (8) is: if
$M P(\mathcal{L}^*|n) < 1$ use (A1) computed via the recursion
relation; otherwise if $P(\mathcal{L}^*|n) < 0.125$ use (A2); otherwise
evaluate (8) directly using normal floating point operations. The
threshold of $P(\mathcal{L}^*|n) < 0.125$ for using (A2) was found
empirically; the results are not sensitive to the choice of this
number.



2. Equation (10)

The evaluation of (10) for events that are interesting as detection
candidates after an experiment is concluded is straightforward using
double-precision floating-point arithmetic. In this regime,
$P(\mathcal{L}^*|n) \sim 10^{-5}$, and there is plenty of numerical
dynamic range available. However, the practical use of (10) is in its
ability to identify "once a day" or "once an hour" events for the
purpose of providing alerts to the transient astronomy community. After
just one day, 24 "once an hour" background events are expected, and
their FAP (the probability of observing at least one such event from a
Poisson process you expect to have produced 24) is 0.9999999999622486.
After 37 events are expected, double-precision numbers can no longer be
used to differentiate those events' FAPs from 1; that is, (10) can only
assign reliable false-alarm rates to the 30 or so most significant
background events in any experiment.

This problem is addressed by not computing the expected number of
events, $\lambda(\mathcal{L}^*)$, from the false-alarm probability,
$P(\mathcal{L}^*)$, as shown in (10), but by first going back and
rewriting (7) and (8) as

$$1 - P(\mathcal{L}^*|n_1, \ldots, n_M) = \left[\int_0^{\mathcal{L}^*} P(\mathcal{L}|n)\, d\mathcal{L}\right]^M, \tag{A3}$$

from which we can rewrite (10) as

$$\lambda(\mathcal{L}^*) = -M \ln \int_0^{\mathcal{L}^*} P(\mathcal{L}|n)\, d\mathcal{L}. \tag{A4}$$

This form of the expression presents no challenges to its evaluation
using double-precision floating point arithmetic.
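
The three-branch procedure for (8) and the rewritten form (A4) can be
sketched in Python as follows. The function names, the convergence
tolerance and the term limit are assumptions made for illustration, and
`expected_events` takes the single-trial FAP $P(\mathcal{L}^*|n)$ as
input and uses `math.log1p` to form the logarithm of its complement,
which is a choice of this sketch rather than something specified above.

```python
import math


def fap_series(p, m, tol=1e-17, max_terms=10000):
    """(A1) via the recursion: term_i = term_{i-1} * P * (i - M) / (i + 1)."""
    term = m * p  # i = 0 term
    total = term
    for i in range(1, max_terms):
        term *= p * (i - m) / (i + 1)
        if term == 0.0:  # series terminates exactly for integer M
            break
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def m_log_one_minus_p(p, m, tol=1e-17, max_terms=10000):
    """(A2b): M ln(1 - P) = -M P (1 + P/2 + P^2/3 + ...)."""
    power, bracket = 1.0, 1.0
    for k in range(2, max_terms):
        power *= p
        delta = power / k
        bracket += delta
        if delta <= tol * bracket:
            break
    return -m * p * bracket


def trials_corrected_fap(p, m):
    """Evaluate Eq. (8), 1 - (1 - P)^M, using the three branches described above."""
    if m * p < 1.0:
        return fap_series(p, m)
    if p < 0.125:
        return 1.0 - math.exp(m_log_one_minus_p(p, m))  # (A2a)
    return 1.0 - (1.0 - p) ** m


def expected_events(p, m):
    """(A4): lambda = -M ln(1 - P), with 1 - P the single-trial cumulative probability."""
    return -m * math.log1p(-p)


print(trials_corrected_fap(1e-9, 60000))  # series branch, M * P << 1
print(trials_corrected_fap(1e-3, 5000))   # exponential branch, M * P > 1
print(trials_corrected_fap(0.5, 3))       # direct evaluation
print(expected_events(1e-12, 10**10))     # mean background count, about 0.01
```

Because `expected_events` works from the single-trial probability, the
mean background count stays well resolved even when the
trials-corrected FAP itself is indistinguishable from 1.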